
    MultiBaC: an R package to remove batch effects in multi-omic experiments

    Motivation: Batch effects in omics datasets are usually a source of technical noise that masks the biological signal and hampers data analysis. Batch effect removal has been widely addressed for individual omics technologies. However, multi-omic datasets may combine data obtained in different batches, where omics type and batch are often confounded. Moreover, systematic biases may be introduced without notice during data acquisition, creating a hidden batch effect. Current methods fail to address batch effect correction in these cases. Results: In this article, we introduce the MultiBaC R package, a tool for batch effect removal in multi-omics and hidden batch effect scenarios. The package includes a variety of graphical outputs for model validation and for assessing the batch effect correction. Availability and implementation: The MultiBaC package is available on Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/MultiBaC.html) and GitHub (https://github.com/ConesaLab/MultiBaC.git). The data underlying this article are available in the Gene Expression Omnibus repository (accession numbers GSE11521, GSE1002, GSE56622 and GSE43747). This work was funded by the Generalitat Valenciana through the PROMETEO grants program for excellence research groups [PROMETEO 2016/093] and by the Spanish MICINN [PID2020-119537RB-I00]. Funding for open access charge: Universitat Politècnica de València.

    Soledad, salud mental y la COVID-19 en la población española (Loneliness, mental health and COVID-19 in the Spanish population)

    The study aim was to assess the effects of the health emergency and the stay-at-home restrictions on loneliness variables in the Spanish population during the initial stage of COVID-19. A cross-sectional study was conducted through an online survey of 3480 people. From March 14, 2020, screening tests were used to evaluate sociodemographic and COVID-19-related data on loneliness, social support, the presence of mental health symptoms, discrimination, and spiritual well-being. Descriptive analyses were conducted and linear regression models were constructed. A negative association was found between loneliness and being older, being partnered, having children, being a university graduate, being retired or still working, having stronger religious beliefs, believing that information provided about the pandemic was adequate, having social support, and having self-compassion. Actions that promote social support and further studies on loneliness in groups of older people are needed to prevent the pandemic having a stronger impact on mental health and well-being.

    MultiBaC: A strategy to remove batch effects between different omic data types

    The diversity of omic technologies has expanded in recent years, together with the number of omic data integration strategies. However, multiomic data generation is costly, and many research groups cannot afford projects in which many different omic techniques are generated, at least not at the same time. As most researchers share their data in public repositories, different omic datasets of the same biological system obtained in different labs can be combined to construct a multiomic study. However, data obtained in different labs or at different times are typically subject to batch effects that must be removed for successful data integration. While there are methods to correct batch effects on the same data type obtained in different studies, they cannot be applied to correct lab or batch effects across omics. This impairs multiomic meta-analysis. Fortunately, in many cases at least one omics platform (i.e., gene expression) is repeatedly measured across labs, together with additional omic modalities that are specific to each study. This creates an opportunity for batch analysis. We have developed MultiBaC (Multiomics Batch-effect Correction), a strategy to correct batch effects in multiomic datasets distributed across different labs or data acquisition events. Our strategy is based on the existence of at least one shared data type, which allows data prediction across omics. We validate this approach both on simulated data and on a case where the multiomic design is fully shared by two labs, so that batch effect correction within the same omic modality using traditional methods can be compared with the MultiBaC correction across data types.
Finally, we apply MultiBaC to a true multiomic data integration problem to show that we are able to improve the detection of meaningful biological effects. The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is part of a research project that is totally funded by Conselleria d'Educacio, Cultura i Esport (Generalitat Valenciana) through PROMETEO grants program for excellence research groups. Ugidos, M.; Tarazona Campos, S.; Prats-Montalbán, JM.; Ferrer, A.; Conesa, A. (2020). MultiBaC: A strategy to remove batch effects between different omic data types. Statistical Methods in Medical Research. 29(10):2851-2864. https://doi.org/10.1177/0962280220907365
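
The cross-omic prediction idea described above can be pictured with a deliberately simplified Python sketch; it is not the MultiBaC implementation. It assumes two batches that share one omic (gene expression, `x`), a second omic `y` measured only in batch A, and a single feature per omic. Ordinary least squares stands in for the PLS models, and per-batch mean-centering stands in for the ANOVA-SCA-based correction. All function names and data values are hypothetical.

```python
from statistics import mean

# Toy illustration only: ordinary least squares replaces MultiBaC's PLS models,
# and per-batch mean-centering replaces its ANOVA-SCA correction step.


def fit_ols(x, y):
    """Fit y = a + b * x for a single feature by least squares."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict(model, x):
    """Predict the batch-specific omic from the shared omic."""
    intercept, slope = model
    return [intercept + slope * xi for xi in x]


def center_batches(batches):
    """Shift every batch so its mean equals the grand mean of all samples."""
    grand = mean([v for values in batches.values() for v in values])
    return {
        name: [v - mean(values) + grand for v in values]
        for name, values in batches.items()
    }


# Hypothetical data: gene expression (shared omic) in both batches,
# a second omic measured only in batch A.
x_a = [1.0, 2.0, 3.0, 4.0]
y_a = [2.0, 4.0, 6.0, 8.0]
x_b = [4.0, 5.0, 6.0, 9.0]

model = fit_ols(x_a, y_a)          # learned where both omics were measured
y_b_pred = predict(model, x_b)     # second omic predicted for batch B
corrected = center_batches({"A": y_a, "B": y_b_pred})
print(corrected["A"])              # real batch-A data after correction
```

Only `corrected["A"]` holds real measurements; the batch-B values are predictions, used here solely so that the between-batch shift of the second omic can be estimated at all. The actual method operates on full multivariate matrices.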

    In-hospital postoperative infection after heart transplantation: risk factors and development of a novel predictive score

    Introduction: Infection is one of the most significant complications following heart transplantation (HT). The aim of this study was to identify specific risk factors for early postoperative infections in HT recipients and to develop a multivariable predictive model to identify HT recipients at high risk. Methods: A single-center, observational, retrospective study was conducted. The dependent variable was in-hospital postoperative infection. We examined demographic and epidemiological data from donors and recipients, surgical features, and adverse postoperative events as independent variables. Backward stepwise multivariable logistic regression, with a P-value threshold of < 0.05, was used to identify clinical factors independently associated with the risk of in-hospital postoperative infection following HT. Results: Six hundred seventy-seven patients were included in this study. During the in-hospital postoperative period, 348 episodes of infection were diagnosed in 239 (35.3%) patients. Seven variables were identified as independent clinical predictors of early postoperative infection after HT: history of diabetes mellitus, previous sternotomy, preoperative mechanical ventilation, primary graft failure, major surgical bleeding, use of mycophenolate mofetil, and use of itraconazole. Based on the results of the multivariable models, we constructed a 7-variable (8-point) score to predict the risk of in-hospital postoperative infection in HT recipients, which showed a reasonable ability to predict this risk in this population. Prospective external validation of this new score is warranted to confirm its clinical applicability. Conclusions: In-hospital postoperative infection is a common complication after HT, affecting 35% of patients who underwent this procedure at our institution.
Diabetes mellitus, previous sternotomy, preoperative mechanical ventilation, primary graft failure, major surgical bleeding, use of mycophenolate mofetil, and use of itraconazole were all independent clinical predictors of early postoperative infection after HT.
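
The score described above is additive: each of the seven predictors contributes points, and the "8-point" label presumably refers to the maximum total. A minimal Python sketch of applying such a score, assuming a simple sum of points; the factor identifiers and the function are hypothetical. Because the abstract does not say how the 8 points are split across the 7 variables, the weights are a parameter and the example uses placeholder values, not the published ones.

```python
# Predictors listed in the abstract. The point value of each one is NOT given
# there, so weights are passed in rather than hard-coded.
RISK_FACTORS = (
    "diabetes_mellitus",
    "previous_sternotomy",
    "preoperative_mechanical_ventilation",
    "primary_graft_failure",
    "major_surgical_bleeding",
    "mycophenolate_mofetil",
    "itraconazole",
)


def infection_score(present, weights):
    """Add up the points of every risk factor present in one patient."""
    unknown = set(present) - set(weights)
    if unknown:
        raise ValueError(f"unknown risk factors: {sorted(unknown)}")
    return sum(weights[factor] for factor in present)


# Placeholder weights of 1 point each (hypothetical, not the published score).
placeholder_weights = {factor: 1 for factor in RISK_FACTORS}
print(infection_score({"diabetes_mellitus", "previous_sternotomy"}, placeholder_weights))
```

Any risk cut-offs for the published score would likewise have to come from the original article.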

    In-Hospital Post-Operative Infection after Heart Transplantation: Epidemiology, Clinical Management, and Outcome

    Background: Infection is a major cause of morbidity and mortality after heart transplantation (HT). Little information is available about its importance in the immediate post-operative period. The aim of this study was to analyze the characteristics, incidence, and outcomes of in-hospital post-operative infections after HT. Methods: We conducted an observational, single-center study of 677 adults who underwent HT from 1991 to 2015 and survived the surgical intervention. In-hospital post-operative infections were identified retrospectively from the medical findings recorded in the clinical records. Results: Over a mean hospital stay of 24.5 days, 239 patients (35.3%) developed 348 episodes of infection (2 episodes per 100 patient-days). The most common sources of infection were those related to invasive procedures (respiratory infections, 115 [33%]; urinary tract infections, 47 [13.5%]; bacteremia, 42 [12.1%]; surgical site infections, 25 [7.2%]), in addition to abdominal foci (33, 9.5%). Enterobacteriaceae (76, 21.8%) and gram-positive cocci (58, 16.7%) were the predominant pathogens, although opportunistic infections were not infrequent (69, 19.8%). Ninety-five septic episodes were detected, with a mean Sequential Organ Failure Assessment (SOFA) score of 9.5 ± 5.3 points; hemodynamic failure was the most severe organ dysfunction and renal dysfunction the most frequent. Management included broad-spectrum antibiotics in 48.8% of episodes and surgical management in 13.8%. The overall antimicrobial success rate was 96.3%. In-hospital mortality was higher among infected patients (15.1% vs. 10.3%), but the difference was not statistically significant (p = 0.067). One-year survival and event rates did not differ between patients who suffered a post-operative infection and those who did not.
Conclusions: In-hospital infections were frequent in the post-operative period after HT and were associated with a poor short-term outcome. Patients who survived sepsis had one-year morbidity and mortality similar to those of patients who did not develop an infection.
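
The headline figures above can be checked with a short Python calculation. It assumes total follow-up is approximated as 677 patients × 24.5 days, since the abstract reports only the mean stay; the helper names are illustrative. The calculation recovers the 35.3% infected share and a rate of about 2 episodes per 100 patient-days.

```python
def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


def rate_per_100_patient_days(episodes, patients, mean_stay_days):
    """Incidence density, approximating total follow-up as
    number of patients x mean length of stay (an assumption)."""
    patient_days = patients * mean_stay_days
    return 100.0 * episodes / patient_days


infected = percent(239, 677)
rate = rate_per_100_patient_days(348, 677, 24.5)
print(f"{infected:.1f}% of patients infected")
print(f"{rate:.1f} episodes per 100 patient-days")
```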

    Three Pseudomonas putida FNR Family Proteins with Different Sensitivities to O2

    The Escherichia coli fumarate-nitrate reduction regulator (FNR) protein is the paradigm for bacterial O2-sensing transcription factors. However, unlike E. coli, some bacterial species possess multiple FNR proteins that presumably have evolved to fulfill distinct roles. Here, three FNR proteins (ANR, PP_3233, and PP_3287) from a single bacterial species, Pseudomonas putida KT2440, have been analyzed. Under anaerobic conditions, all three proteins had spectral properties resembling those of [4Fe-4S] proteins. The reactivity of the ANR [4Fe-4S] cluster with O2 was similar to that of E. coli FNR, and during conversion to the apo-protein, via a [2Fe-2S] intermediate, cluster sulfur was retained. Like ANR, reconstituted PP_3233 and PP_3287 were converted to [2Fe-2S] forms when exposed to O2, but their [4Fe-4S] clusters reacted more slowly. Transcription from an FNR-dependent promoter with a consensus FNR-binding site in P. putida and E. coli strains expressing only one FNR protein was consistent with the in vitro responses to O2. Taken together, the experimental results suggest that the local environments of the iron-sulfur clusters in the different P. putida FNR proteins influence their reactivity with O2, such that ANR resembles E. coli FNR and is highly responsive to low concentrations of O2, whereas PP_3233 and PP_3287 have evolved to be less sensitive to O2.

    Genome-wide significant association with seven novel multiple sclerosis risk loci

    Objective: A recent large-scale study in multiple sclerosis (MS) using the ImmunoChip platform reported 11 loci that showed suggestive genetic association with MS. Additional data in sufficiently sized, independent data sets are needed to assess whether these loci represent genuine MS risk factors. Methods: The lead SNPs of all 11 loci were genotyped in 10 796 MS cases and 10 793 controls from Germany, Spain, France, the Netherlands, Austria and Russia, independent of the previously reported cohorts. Association analyses were performed using logistic regression based on an additive model. Summary effect size estimates were calculated using fixed-effect meta-analysis. Results: Seven of the 11 tested SNPs showed significant association with MS susceptibility in the 21 589 individuals analysed here. Meta-analysis across our data and previously published MS case-control data (total sample size n=101 683) revealed novel genome-wide significant association with MS susceptibility (p<5×10⁻⁸) for all seven variants. These included SNPs in or near LOC100506457 (rs1534422, p=4.03×10⁻¹²), CD28 (rs6435203, p=1.35×10⁻⁹), LPP (rs4686953, p=3.35×10⁻⁸), ETS1 (rs3809006, p=7.74×10⁻⁹), DLEU1 (rs806349, p=8.14×10⁻¹²), LPIN3 (rs6072343, p=7.16×10⁻¹²) and IFNGR2 (rs9808753, p=4.40×10⁻¹⁰). Cis expression quantitative trait locus (eQTL) effects were observed in silico for rs6435203 on CD28 and for rs9808753 on several immunologically relevant genes in the IFNGR2 locus. Conclusions: This study adds seven loci to the list of genuine MS genetic risk factors and further extends the list of established loci shared across autoimmune diseases.
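
The abstract does not state which fixed-effect estimator was used; a common choice is inverse-variance weighting of per-study log odds ratios. The Python sketch below assumes that estimator and a two-sided normal-approximation p-value, compared against the genome-wide threshold of 5×10⁻⁸ quoted above. The function name and input values are hypothetical, not the study's data.

```python
import math

GENOME_WIDE_ALPHA = 5e-8


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   their standard errors
    Returns the pooled estimate, its standard error and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled, pooled_se, p


# Hypothetical per-study log odds ratios and standard errors
beta, se, p = fixed_effect_meta([0.10, 0.12, 0.08], [0.03, 0.02, 0.025])
print(f"OR={math.exp(beta):.3f}, p={p:.2e}, genome-wide: {p < GENOME_WIDE_ALPHA}")
```

Weighting by 1/SE² lets the most precise studies dominate the pooled estimate, which is the defining assumption of a fixed-effect model (one true effect shared by all cohorts).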

    Statistical Methods Development for the Multiomic Systems Biology

    Systems Biology research has expanded over recent years together with the development of omic technologies. The combination and simultaneous analysis of different kinds of omic data allow the study of the connections and relationships between different cellular layers. Indeed, multiomic integration strategies provide a key source of knowledge about the cell as a system. The present Ph.D. thesis aims to study, develop and apply multiomic integration approaches in the field of systems biology. The still high cost of omics technologies makes it difficult for most laboratories to afford a complete multiomic study. However, the wide availability of omic data in public repositories allows the reuse of these already generated data. Unfortunately, combining omic data from different sources introduces unwanted noise into the data, known as batch effect. Batch effect impairs the correct integrative analysis of the data, so the use of so-called Batch Effect Correction Algorithms is necessary. As of today, there is a large number of such algorithms, based on different statistical models and methods, that correct batch effect as part of the data pre-processing steps. However, the existing methods are not intended for multiomic designs, as they only allow the correction of the same type of omic data, which must be measured across all batches. For this reason, we developed the MultiBaC algorithm, which removes batch effect in multiomic designs, allowing the correction of data that are not measured across all batches.
MultiBaC is based on PLS regression and ANOVA-SCA models and was validated and evaluated on different datasets. We also present MultiBaC as an R package to facilitate the use of this tool. Most existing multiomic integration approaches are multivariate methods based on latent space analysis. These methods are known as data-driven because they rely on searching for correlations to determine the relationships between the different variables. Data-driven methods require a large number of observations or samples to find robust and/or significant correlations among features. Unfortunately, in the molecular biology field, datasets with a large number of samples are not very common, again due to the high cost of generating omic data. As an alternative to data-driven methods, some multiomic integration strategies are based on model-driven approaches. These methods can be fitted with a smaller number of observations and are very useful for finding mechanistic relationships between different cellular components. However, model-driven methods require a priori information, which is usually a metabolic model of the organism under study. Currently, only transcriptomics and quantitative metabolomics have been successfully integrated using model-driven methods. Nonetheless, quantitative metabolomics is not very widespread, and most laboratories generate non-quantitative or semi-quantitative metabolomics, which cannot be integrated with current methods. To address this issue, we developed MAMBA, a model-driven multiomic integration method that relies on mathematical optimization and is able to jointly analyze non-quantitative or semi-quantitative metabolomics with other types of gene-centric omic data, such as transcriptomics. MAMBA was compared with other existing methods in terms of metabolite prediction accuracy and was applied to a multiomic dataset generated within the PROMETEO project, in which this thesis is framed.
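The statement that data-driven methods need many samples to find significant correlations can be quantified with a short calculation. As an illustrative assumption (the thesis does not specify a test), take a two-sided test of a Pearson correlation at level alpha, use Fisher's z approximation, and Bonferroni-correct for the number of feature pairs tested; the smallest detectable |r| is then tanh(z_{1-alpha'/2} / sqrt(n - 3)). `critical_correlation` is a hypothetical helper, not part of any package named here.

```python
# Hypothetical helper illustrating why correlation-based (data-driven) methods need many samples.
import math
from statistics import NormalDist


def critical_correlation(n_samples, alpha=0.05, n_tests=1):
    """Smallest |r| that reaches two-sided significance, via Fisher's z approximation.

    atanh(r) is approximately normal with standard error 1 / sqrt(n - 3), so
    |r| is significant when |atanh(r)| * sqrt(n - 3) > z_{1 - alpha'/2},
    where alpha' = alpha / n_tests is a Bonferroni-adjusted level.
    """
    if n_samples < 4:
        raise ValueError("Fisher's z approximation needs at least 4 samples")
    adjusted_alpha = alpha / n_tests
    z_crit = NormalDist().inv_cdf(1 - adjusted_alpha / 2)
    return math.tanh(z_crit / math.sqrt(n_samples - 3))


if __name__ == "__main__":
    # One test versus 10,000 feature pairs (a modest omic-scale screen).
    for n in (6, 10, 30, 100):
        single = critical_correlation(n)
        screened = critical_correlation(n, n_tests=10_000)
        print(f"n={n:>3}: |r| > {single:.2f} (single test), |r| > {screened:.2f} (10,000 tests)")
```

Under these assumptions, with ten samples only correlations of magnitude above roughly 0.63 pass a single test, whereas with a hundred samples about 0.2 suffices; correcting for many simultaneous tests raises both thresholds further.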
MAMBA proved to capture the known biology of our experimental design and was useful for deriving new findings and biological hypotheses. Altogether, this thesis presents useful tools for the field of systems biology, covering both the pre-processing of multiomic datasets and their subsequent integrative statistical analysis.
Ugidos Guerrero, M. (2023). Statistical Methods Development for the Multiomic Systems Biology [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/19303

    MultiBaC: an R package to remove batch effects in multi-omic experiments

    [EN] Motivation: Batch effects in omics datasets are usually a source of technical noise that masks the biological signal and hampers data analysis. Batch effect removal has been widely addressed for individual omics technologies. However, multi-omic datasets may combine data obtained in different batches where omics type and batch are often confounded. Moreover, systematic biases may be introduced without notice during data acquisition, which creates a hidden batch effect. Current methods fail to address batch effect correction in these cases. Results: In this article, we introduce the MultiBaC R package, a tool for batch effect removal in multi-omics and hidden batch effect scenarios. The package includes a variety of graphical outputs for model validation and assessment of the batch effect correction. This work was funded by the Generalitat Valenciana through PROMETEO grants program for excellence research groups [PROMETEO 2016/093] and by the Spanish MICINN [PID2020-119537RB-I00]. Funding for open access charge: Universitat Politècnica de València.
    Ugidos, M.; Nueda, MJ.; Prats-Montalbán, JM.; Ferrer, A.; Conesa, A.; Tarazona, S. (2022). MultiBaC: an R package to remove batch effects in multi-omic experiments. Bioinformatics. 38(9):2657-2658. https://doi.org/10.1093/bioinformatics/btac132
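The thesis abstract states that MultiBaC builds on ANOVA-SCA models. The following is a heavily simplified, pure-Python illustration of the ANOVA-SCA idea applied to batch removal, under stated assumptions: the data are a samples x features list of lists, batch is the only ANOVA factor modelled, and only the leading component of the batch-effect matrix is removed. The function name `remove_batch_component` is hypothetical; this is not the MultiBaC package's code or API, and it omits the PLS regression part mentioned in the thesis abstract.

```python
# Hypothetical ANOVA-SCA-style sketch, NOT the MultiBaC implementation.
from statistics import mean


def remove_batch_component(X, batches, iters=100):
    """Remove the leading batch-effect component from a samples x features matrix.

    1. Column-center X.
    2. Build the batch-effect matrix B (each row = its batch's mean row), i.e. the
       ANOVA decomposition restricted to a single batch factor.
    3. Find B's first loading vector v by power iteration on B^T B (the SCA step).
    4. Subtract the rank-1 batch estimate (B v) v^T and restore the column means.
    """
    n, p = len(X), len(X[0])
    col_means = [mean(col) for col in zip(*X)]
    Xc = [[x - m for x, m in zip(row, col_means)] for row in X]

    groups = {}
    for row, b in zip(Xc, batches):
        groups.setdefault(b, []).append(row)
    batch_means = {b: [mean(col) for col in zip(*rows)] for b, rows in groups.items()}
    B = [batch_means[b] for b in batches]

    v = [1.0 + 0.1 * j for j in range(p)]  # arbitrary non-uniform start vector
    for _ in range(iters):
        u = [sum(bij * vj for bij, vj in zip(row, v)) for row in B]    # u = B v
        w = [sum(B[i][j] * u[i] for i in range(n)) for j in range(p)]  # w = B^T u
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # no batch signal to remove
            return [row[:] for row in X]
        v = [x / norm for x in w]

    scores = [sum(bij * vj for bij, vj in zip(row, v)) for row in B]  # batch scores
    return [
        [xc - t * vj + m for xc, vj, m in zip(row, v, col_means)]
        for row, t in zip(Xc, scores)
    ]


if __name__ == "__main__":
    X = [[1.0, 10.0], [2.0, 11.0], [5.0, 14.0], [6.0, 15.0]]  # batch B shifted by +4
    out = remove_batch_component(X, ["A", "A", "B", "B"])
    print([[round(x, 6) for x in row] for row in out])
    # [[3.0, 12.0], [4.0, 13.0], [3.0, 12.0], [4.0, 13.0]]
```

With only two batches the centered batch-mean matrix has rank at most one, so removing its leading component reduces to per-batch centering; the component-based form becomes relevant with several batches, where one may choose to remove only the dominant batch directions.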